§sync_executor
This crate provides an async executor for executing a future that you know will never return `Poll::Pending` when polled.

```rust
async fn add(a: u64, b: u64) -> u64 {
    a + b
}

fn main() {
    // Poll the future returned by `add`, and error if this returns `Poll::Pending`.
    let result = sync_executor::block_on(add(2, 2)).unwrap();
    assert_eq!(result, 4);
}
```

If the future passed to `block_on()` doesn’t resolve immediately, then `block_on()` will return an error.
This crate lets you write async functions that safely make blocking calls: because we know no other tasks will be waiting on the executor, we know we won’t block anything else by blocking the thread.
§Wait… Why would I ever want this?
Let’s suppose you’re trying to write a library, and you want to provide both an async and a blocking API. There are a number of ways you could go about this:
- Copy-paste all your code.
- Use a macro like maybe_async or bisync. maybe_async is very popular, but doesn’t handle the “library” case very well because it can’t generate both async and blocking code in the same compile. bisync handles this, but isn’t very popular.
- Use sans-io.
These all have upsides and downsides. Copy-pasting code works, and gives you quite a bit of control, but every time you need to fix a bug you’re going to have to fix it in multiple places. There are popular macro solutions to this like maybe_async, but maybe_async will only generate async code or blocking code, not both at the same time, so you’ll run into problems with it because of feature unification.
Sans-io works, but the problem with sans-io specifically (as noted by James and Amos) is that you have to hand-craft a complicated state machine. And why do that, when the whole point of async is to get the compiler to build complicated state machines for you?
But this got me thinking: could I do “sans-io”… with async? Basically, the idea is to use Rust’s compiler to turn async code into a state machine, and then provide both an async and a blocking implementation via monomorphization. We can’t await an async function from inside blocking code, but we can call into a blocking function from inside an async function.

“But wait!”, you say, “You can’t call blocking functions from inside an async function! Isn’t that bad?” Well, actually, that’s not quite true. It’s a bad idea to call a blocking function from inside an async function because if there are other tasks running, you may block the executor and prevent those tasks from running. There’s no preemptive multitasking here, so we have to rely on async functions yielding back to the executor from time to time. But what if we were guaranteed there were no other tasks running? If we were running our task on a thread by itself, then it’d be safe to call into blocking functions. In fact, if we only called blocking functions, our “state machine” would resolve immediately: as soon as we polled our future, it would return `Poll::Ready`.
And this is where sync_executor comes in. sync_executor is extremely lightweight, having no dependencies at all. The only other executor that comes close is pollster.
§A Concrete Example
We’re going to write a toy example library that downloads a text file, but if there’s any kind of error, we’ll wait for a second and then try again until we succeed. Here’s what a simple “copy-paste” example of our library might look like:
```rust
use std::time::Duration;

pub async fn get_text_file_async(file_url: &str) -> String {
    loop {
        let response = reqwest::get(file_url).await;
        if let Ok(response) = response
            && response.status().is_success()
            && let Ok(text) = response.text().await
        {
            return text;
        } else {
            tokio::time::sleep(Duration::from_secs(1)).await;
        }
    }
}

pub fn get_text_file_blocking(file_url: &str) -> String {
    loop {
        let response = ureq::get(file_url).call();
        if let Ok(response) = response
            && response.status().is_success()
            && let Ok(text) = response.into_body().read_to_string()
        {
            return text;
        } else {
            std::thread::sleep(Duration::from_secs(1));
        }
    }
}
```

Immediately we notice that the “structure” of these two functions is very similar at a high level. Note though that there are some minor differences, which come about because we’re using reqwest and tokio in the async version, and the blocking HTTP client ureq and the standard library in the blocking version. We call sleep in both, but it’s actually a different implementation of sleep in each case.
To use sync_executor to refactor this, first we want to identify the common parts (making the HTTP request and reading the response, and sleeping) and extract these into a trait. Then we can use this trait to write a “shared” version of our function that we can call into from both the blocking and async versions:
```rust
use std::time::Duration;

/// `System` is a trait that encapsulates all the parts of our program that do IO
/// or wait, or that otherwise differ between our async and blocking versions.
trait System {
    async fn get_text(url: &str) -> anyhow::Result<String>;
    async fn sleep(duration: Duration);
}

/// This is the "shared" part of our code; the business logic that's the same
/// between the blocking and async versions.
async fn get_text_file_common<A: System>(file_url: &str) -> String {
    loop {
        let text = A::get_text(file_url).await;
        if let Ok(text) = text {
            return text;
        } else {
            A::sleep(Duration::from_secs(1)).await;
        }
    }
}
```

We’re using anyhow for error handling here because it makes the code a little easier to read, and error handling isn’t really the “interesting part”. Writing the async version is pretty straightforward. No surprises here:
```rust
struct ReqwestSystem;

impl System for ReqwestSystem {
    async fn get_text(url: &str) -> anyhow::Result<String> {
        let response = reqwest::get(url).await?;
        if !response.status().is_success() {
            anyhow::bail!(response.status().to_string());
        }
        Ok(response.text().await?)
    }

    async fn sleep(duration: Duration) {
        tokio::time::sleep(duration).await;
    }
}

pub async fn get_text_file_async(file_url: &str) -> String {
    // Call into the shared function.
    get_text_file_common::<ReqwestSystem>(file_url).await
}
```

And then we come to the blocking side, which is where things get interesting:
```rust
struct UreqSystem;

impl System for UreqSystem {
    async fn get_text(url: &str) -> anyhow::Result<String> {
        let response = ureq::get(url).call()?;
        if !response.status().is_success() {
            anyhow::bail!(response.status().to_string());
        }
        Ok(response.into_body().read_to_string()?)
    }

    async fn sleep(duration: Duration) {
        std::thread::sleep(duration);
    }
}

pub fn get_text_file_blocking(file_url: &str) -> String {
    // Call into the shared function.
    sync_executor::block_on(get_text_file_common::<UreqSystem>(file_url))
        .expect("get_text_file_common should never be pending")
}
```

The idea here is that in the “blocking” case we only ever call into blocking functions, so the `Future` returned by `get_text_file_common` should never return `Poll::Pending` when polled. (If you really need to be able to handle the pending case for some reason, have a look at pollster, another extremely lightweight executor.) In the blocking case, the whole “state machine” resolves in a single call.
To spell out the invariant you need to uphold a little more formally: your “shared” async functions must only ever await async functions from your traits, or other “shared” async functions. Your public async APIs must only use trait implementations that never call into blocking code, and your blocking APIs must only use trait implementations that never call into real async code.
In practice, when I’ve used this in larger projects, I’ve created a module called maybe_async which exports all the various traits I need to implement, and then I have a module named feat that has all the concrete implementations of those traits, hidden behind feature flags.
§Why is this awesome?
If you have a medium sized library with some complicated logic, this makes it so you don’t have to duplicate that logic in two places. It does a fairly good job of DRYing up your code.
This strategy also gives you quite a lot of control. If you’re using a library like bisync to generate blocking and async versions, that’s fine so long as you just want to create one blocking version and one async version. Because we’re free to implement our trait as many times as we like, it’d be trivial to take the example above and implement the trait again using the reqwest blocking client, to give our users the choice of using reqwest or ureq. Or we could implement the trait again using async-std and surf for those users who really don’t like tokio for whatever reasons. You could even expose these traits to your end users to let them provide their own implementations.
§Why is this not-so-awesome?
Every solution to a problem has upsides and downsides, and this solution is no different.
First, it’s weird. Once it’s explained to you, it hopefully makes sense. But if a developer who hasn’t had this explained to them stumbles onto your code then they’re going to wonder what’s going on.
Second, especially if you’re that developer who’s stumbled into this, it’s easy to make a mistake and call a normal async function from the “shared” code, which results in a runtime error rather than a compile error. This isn’t so different from regular async code: there’s a rule in “normal” async code that you’re not supposed to call blocking functions from an async context, and the compiler doesn’t catch violations of that rule either. In that sense this is no worse than normal async code; it’s just that the rules here are a little different. But because it’s weird (see point one above), it’s more likely that contributors to your project won’t know those rules.
§Functions

- `block_on` — Runs a future to completion. The given future is treated as a simple state machine. It will be executed entirely on the current thread, and this function will not return until the future completes.